Functional modes of proteins are among the most robust ones

It is shown that a small subset of modes which are likely to be
involved in protein functional motions of large amplitude can be
determined by retaining the most robust normal modes obtained using
different protein models. This result should prove helpful in the
context of several recently proposed applications, such as solving
difficult molecular replacement problems or fitting atomic structures
into low-resolution electron density maps. Moreover, it may also pave
the way for the development of methods able to predict such motions
accurately.

In the case of two-domain proteins, it is well known that a few
low-frequency normal modes can provide a fair description of their
large amplitude motion upon ligand binding. More recently, it has been
shown that this is also true for proteins with more complex
architectures, as long as their functional motion is a collective one,
i.e. if it involves large parts of the structure. For instance, a
single low-frequency mode of the T form of hemoglobin is enough to
describe accurately its conformational change upon oxygen binding.

This result has been successfully applied to exploiting fiber
diffraction data, solving difficult molecular replacement problems,
and fitting atomic structures into low-resolution electron density
maps. The principle of these applications is to perturb a known
structure along its low-frequency modes so as to get a deformed
structure that is consistent with low-resolution biophysical data
obtained after the protein has undergone some large amplitude
conformational change. It was also shown that when variations of a few
key distances are known, through spectroscopic measurements for
instance, it is possible, using linear response theory, to identify
which modes are the most involved in the conformational change.
However, if such experimental data are missing, it is difficult to
guess which low-frequency modes are the functional ones. Hereafter, we
show that they are among the most robust ones, i.e. among the most
conserved modes when different descriptions of a given protein are
considered. The robustness of the functional modes was recognized when
it was shown that they can be obtained with simple protein
descriptions, like Elastic Network (EN) models. Herein, this property
is used to identify them.

First, standard normal modes were calculated for a set of five
proteins of different sizes and architectures after preliminary
energy-minimization. The CHARMM program [21] was used, with the EEF1.1
implicit solvent model and the corresponding electrostatic and
non-bonded options [22], as done in recent studies performed at this
level of detail [23]. Then, for each energy-minimized structure,
low-frequency normal modes were calculated with the all-atom EN model
proposed by M. Tirion, where the standard, many-parameter, empirical
energy function $E_p$ used in programs like CHARMM is replaced by:

$$E_p = \sum_{d_{ij}^0 < R_c} C \, (d_{ij} - d_{ij}^0)^2 \qquad (1)$$

where $d_{ij}$ is the distance between atoms $i$ and $j$, $d_{ij}^0$
being their distance in the studied structure. The strength of the
potential, $C$, is a constant assumed to be the same for all
interacting pairs. It is required only in order to define energy (and
frequency) units. As done in previous studies, $R_c$, the cut-off
parameter, is set to 5 Å.
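As an illustration of Eq. (1), the corresponding Hessian can be assembled directly from the coordinates of the studied structure. The following is a minimal Python sketch and not the implementation used in this study: the helper names `en_hessian` and `normal_modes`, the use of NumPy, and the choice of unit masses are assumptions made for the example.

```python
import numpy as np

def en_hessian(coords, cutoff=5.0, strength=1.0):
    """Hessian of E_p = sum_{d0_ij < cutoff} C (d_ij - d0_ij)^2, evaluated at
    the reference structure (hypothetical helper, unit masses assumed)."""
    coords = np.asarray(coords, dtype=float)
    n_atoms = len(coords)
    hessian = np.zeros((3 * n_atoms, 3 * n_atoms))
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            delta = coords[j] - coords[i]
            dist = np.linalg.norm(delta)
            if dist >= cutoff:
                continue  # pair not connected by a spring
            unit = delta / dist
            # Second derivative of C (d - d0)^2 at d = d0 is 2C along the bond.
            block = 2.0 * strength * np.outer(unit, unit)
            si, sj = slice(3 * i, 3 * i + 3), slice(3 * j, 3 * j + 3)
            hessian[si, si] += block
            hessian[sj, sj] += block
            hessian[si, sj] -= block
            hessian[sj, si] -= block
    return hessian

def normal_modes(hessian):
    """Eigenvalues (ascending) and modes (one per row) of a symmetric Hessian."""
    eigenvalues, eigenvectors = np.linalg.eigh(hessian)
    return eigenvalues, eigenvectors.T

# Toy example: five points, all within the cutoff of each other.
toy = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
values, modes = normal_modes(en_hessian(toy))
n_zero = int(np.sum(np.abs(values) < 1e-8))  # rigid-body modes
```

For this fully connected toy network, the six lowest eigenvalues vanish: they correspond to the rigid-body translations and rotations. Since $C$ only sets the energy scale, changing `strength` rescales the eigenvalues without changing the modes.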



In order to compare both sets of normal modes, $n_i^{\mathrm{eff}}$,
the effective number of EN modes involved in the description of
standard mode $i$, is calculated as follows [24]:

$$n_i^{\mathrm{eff}} = \exp\Big(-\sum_{j=1}^{n} \alpha \, O_{ij}^2 \ln\big(\alpha \, O_{ij}^2\big)\Big)$$

where $n$ is the number of EN modes taken into account ($n = 100$
herein), $O_{ij}$ being the scalar product between standard mode $i$
and EN mode $j$. The normalization factor $\alpha$ is such that
$\sum_{j=1}^{n} \alpha \, O_{ij}^2 = 1$. Thus, $n_i^{\mathrm{eff}}$
gives the effective number of non-zero (normalized) $O_{ij}^2$. It
ranges from 1 to $n$. As shown in Fig. 1, for each protein considered,
several of its standard normal modes can be described accurately with
fewer than 5-6 EN modes. Moreover, all these robust modes have low
rankings, namely, below #15.
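The effective number of modes defined above is the exponential of the entropy of the normalized squared overlaps. A minimal Python sketch of this calculation is given below; the function name `effective_mode_count` and the storage of modes as rows of a NumPy array are assumptions made for the example.

```python
import numpy as np

def effective_mode_count(reference_mode, other_modes):
    """Effective number of `other_modes` (one mode per row) needed to
    describe `reference_mode`, as the exponential of the entropy of the
    normalized squared overlaps."""
    reference_mode = np.asarray(reference_mode, dtype=float)
    other_modes = np.asarray(other_modes, dtype=float)
    overlaps_sq = (other_modes @ reference_mode) ** 2      # O_ij^2
    total = overlaps_sq.sum()
    if total == 0.0:
        raise ValueError("reference mode is orthogonal to all supplied modes")
    weights = overlaps_sq / total                          # alpha * O_ij^2
    nonzero = weights[weights > 0.0]                       # 0 ln 0 -> 0
    return float(np.exp(-np.sum(nonzero * np.log(nonzero))))

basis = np.eye(4)                                          # four toy modes
single = effective_mode_count([1.0, 0.0, 0.0, 0.0], basis)
spread = effective_mode_count([0.5, 0.5, 0.5, 0.5], basis)
```

In this toy case, a mode that coincides with one of the four basis modes gives an effective number of 1, while a mode spread evenly over all four gives 4, which illustrates the two limits of the range quoted above.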

Figure 1: Effective number of EN normal modes involved in the
description of each standard mode of five proteins. Cross: Lysozyme T4
(pdb code 178l). Plus: Adenylate Kinase (4ake). Open square: Glutamine
Binding Protein (1ggg). Filled square: LAO Binding Protein (2lao).
Open circle: DNA Polymerase β (1bpx). Modes are ranked according to
increasing frequencies. Modes ranked 1 to 6 correspond to the six
zero-frequency rigid-body translations and rotations of each protein.

Next, two other EN models were considered. In both cases, as is often
done, only Cα atoms are kept. In the first model, as proposed by
M. Tirion [see Eq. (1)], pairs of interacting neighbors are determined
according to a distance-cutoff criterion, namely, $R_c$ = 12 Å. With
such a criterion, for Adenylate Kinase, $n_c$, the average number of
interacting neighbors per Cα atom, is 25 ± 7, ranging from 10 to 42,
depending on the degree of burial of the amino-acid in the protein
interior. Note that $R_c$ cannot be set to a value lower than 8-10 Å,
a limit which depends upon the structure considered. Otherwise, the
number of zero-frequency modes becomes larger than six, as a
consequence of the splitting of the elastic network into several
independent ones.
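Whether a given cutoff splits the network can be checked before any mode calculation by counting the connected components of the graph of interacting pairs. The sketch below is one possible way to do so, assuming NumPy and a hypothetical helper named `count_subnetworks`; it is not taken from the original study.

```python
import numpy as np

def count_subnetworks(coords, cutoff):
    """Number of independent sub-networks obtained when atoms closer than
    `cutoff` are linked (depth-first search over the contact graph)."""
    coords = np.asarray(coords, dtype=float)
    n_atoms = len(coords)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    linked = dists < cutoff
    seen = np.zeros(n_atoms, dtype=bool)
    components = 0
    for start in range(n_atoms):
        if seen[start]:
            continue
        components += 1            # a new, not yet visited sub-network
        seen[start] = True
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in np.flatnonzero(linked[node] & ~seen):
                seen[neighbor] = True
                stack.append(int(neighbor))
    return components

# Toy example: two pairs of atoms, 19 length units apart.
toy = [[0, 0, 0], [1, 0, 0], [20, 0, 0], [21, 0, 0]]
split = count_subnetworks(toy, cutoff=5.0)
joined = count_subnetworks(toy, cutoff=25.0)
```

A result larger than one means that the sub-networks can move independently of each other as rigid bodies, which is why additional zero-frequency modes appear.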

The second model was designed so as to keep $n_c$ as constant as
possible from one amino-acid to the other. To do so, we use the
following algorithm. First, all pairs of Cα atoms are sorted according
to their distance. Then, starting from the pair separated by the
largest distance, they are removed one after the other, unless one
atom of the pair already has $n_c$ neighbors. With this algorithm,
setting $n_c$ = 10, the average distance between pairs of interacting
neighbors is 6.2 ± 1.8 Å, ranging from 3.0 to 10.8 Å. Note that in the
case of Adenylate Kinase $n_c$ can be set to a value as low as 7
without splitting the network into independent ones.
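The pruning procedure described above can be sketched as follows. This is one reading of the algorithm, written in Python with NumPy: all pairs start connected, and a pair is kept only when removing it would leave one of its atoms with fewer than $n_c$ neighbors. The function name `prune_to_neighbor_count` and the toy coordinates are assumptions made for the example.

```python
import numpy as np

def prune_to_neighbor_count(coords, n_c):
    """Return the list of kept pairs and the final number of neighbors of
    each atom, after removing the longest pairs first."""
    coords = np.asarray(coords, dtype=float)
    n_atoms = len(coords)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    pairs = [(float(dists[i, j]), i, j)
             for i in range(n_atoms) for j in range(i + 1, n_atoms)]
    pairs.sort(reverse=True)                 # largest distance first
    degree = np.full(n_atoms, n_atoms - 1)   # every atom linked to all others
    kept = []
    for _, i, j in pairs:
        if degree[i] > n_c and degree[j] > n_c:
            degree[i] -= 1                   # both atoms can spare this pair
            degree[j] -= 1
        else:
            kept.append((i, j))              # one atom is already at n_c
    return kept, degree

# Toy example: six atoms on a line, one length unit apart.
toy = [[float(i), 0.0, 0.0] for i in range(6)]
kept_pairs, neighbors = prune_to_neighbor_count(toy, n_c=2)
```

By construction, no atom ends up with fewer than `n_c` neighbors, but some atoms may keep more. Nothing in the procedure prevents the network from splitting when `n_c` is small, so a connectivity check remains useful.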

As done above, normal modes obtained with both EN models were
compared, seeking robust ones, using a set of twenty-two proteins
considered in previous studies performed with the distance-cutoff
criterion. As in the case of all-atom models, modes are considered to
be robust whenever $n_i^{\mathrm{eff}}$ < 6.

Statistics of the number of robust modes found for all studied
proteins are shown in Fig. 2 (zero-frequency modes are not taken into
account). In most cases, the number of robust modes is four or less.
In only three cases is it larger than seven. Interestingly, the DNA
polymerase of bacteriophage RB69 (pdb code 1ih7), which is the protein
of our dataset with the largest number of robust modes (eleven), has a
quite complex architecture, with three well-known structural domains.
It is also among the largest cases considered herein (897
amino-acids).

Figure 2: Number of robust normal modes found by comparing modes
obtained with different protein models. For a first set of five
proteins, standard modes were compared to modes obtained with an
all-atom EN model. For a second set of twenty-two proteins, modes
obtained with two different Cα-EN models were compared. Modes are
considered to be robust when they can be described accurately with at
most six modes obtained with another protein model.

In four cases, no robust mode is found. Interestingly, the known
conformational change of these proteins, namely, Tyrosine Phosphatase,
Triose Phosphate Isomerase, Che Y, and HIV-1 protease (pdb codes are
1yts, 3tim, 3chy, 1hhp, respectively), is a small amplitude one, with
a Cα root-mean-square displacement (r.m.s.d.) of 1.5 Å at most.

Then, it was checked that robust modes yield accurate descriptions of
protein functional motions. To do so, $Q_d$, the quality of the motion
description, is calculated as follows:

$$Q_d = 100 \sum_{i}^{n} I_{id}^2$$

where $n$ is the number of modes taken into account in the description
and $I_{id}$ is the scalar product between mode $i$ and the direction
of the conformational change observed by crystallographers. Note that
$Q_d$ = 100% when all modes are included in the description, since
they form a complete basis set.
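The quality of the description can be computed from the two crystallographic structures and a set of modes, as in the minimal Python sketch below. It assumes that both structures have already been superimposed, that the modes are orthonormal and stored one per row in the same Cartesian coordinates, and that no mass-weighting is applied; the function name `description_quality` is hypothetical.

```python
import numpy as np

def description_quality(modes, start_coords, end_coords):
    """Q_d (in %) for the supplied modes (one orthonormal mode per row)."""
    modes = np.asarray(modes, dtype=float)
    displacement = (np.asarray(end_coords, dtype=float)
                    - np.asarray(start_coords, dtype=float)).ravel()
    direction = displacement / np.linalg.norm(displacement)
    overlaps = modes @ direction             # I_id for each mode i
    return 100.0 * float(np.sum(overlaps ** 2))

# Toy example: two atoms moving towards each other along x, with the
# Cartesian basis of the six-dimensional space used as "modes".
all_modes = np.eye(6)
start = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
end = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
q_full = description_quality(all_modes, start, end)
q_partial = description_quality(all_modes[:3], start, end)
```

With the complete basis the toy motion is fully described, while the three modes that only move the first atom account for half of it. Passing only the rows of the robust modes gives the value of $Q_d$ restricted to that subset.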

In Fig. 3, the conformational change of lactoferrin is shown. It can
be described accurately ($Q_d$ over 85%) as a linear combination of
the seven lowest-frequency modes of the "open" form (see Fig. 4).
Interestingly, all seven modes are found to be robust. In Fig. 5,
$Q_d$ is given as a function of the amplitude of the functional motion
of each protein considered, when $n$ = 100 normal modes or when only
the robust ones are taken into account in the description. For most
proteins with small amplitude motions, i.e. of less than 2-3 Å of
r.m.s.d., robust modes fail to capture any information about the
nature of the known conformational change, while in several cases some
information is indeed present in the normal modes. For instance, as
mentioned above, for HIV-1 protease, no robust mode is found, although
a single EN mode is enough for describing 50% of its conformational
change upon ligand binding. If two other EN modes are added to the
description, $Q_d$ can reach a value of 77% (with $n$ = 100,
$Q_d$ = 89%).

Figure 3: The conformational change of Lactoferrin upon ligand
binding. Left: apo (or "open") state (pdb code 1cb6). Right: holo (or
"closed") state (1lfg). In the latter case, the iron ligands are not
shown. Drawn with Molscript.

On the other hand, when considering proteins with large amplitude
motions, the description of the conformational change with robust
modes is almost as accurate ($Q_d$ over 75%) as when $n$ = 100 normal
modes are taken into account. The only counterexample is Adenylate
Kinase, whose r.m.s.d. upon ligand binding is 5.3 Å (the corresponding
pdb codes of the open and closed crystallographic structures are 4ake
and 1ake). As a matter of fact, when standard normal modes of
Adenylate Kinase are compared to all-atom EN ones, only a single
robust mode is found (see Fig. 1), and it is not involved in the
conformational change ($Q_d$ = 4%). However, using Cα-EN models, six
robust modes are found and they allow for an almost perfect
description of the conformational change ($Q_d$ = 91%).

Of course, when using all-atom models, more robust modes can be
obtained by raising the robustness criterion. In the case of Adenylate
Kinase, if a given mode is said to be robust whenever
$n_i^{\mathrm{eff}}$ < 10, then five robust modes are found. However,
it is still not enough ($Q_d$ = 73%) for describing its conformational
change as well as with robust modes obtained using Cα-EN models.
Raising the robustness criterion so as to obtain six robust modes does
not significantly change the quality of the description
($Q_d$ = 77%). As a matter of fact, robust modes obtained using
all-atom models always yield poorer descriptions of protein functional
motions than simpler models, in which only Cα atoms are kept (open
circles are below open squares in Fig. 5). This is mainly due to the
fact that standard normal mode analysis requires a preliminary
energy-minimization, during which the structure is significantly
distorted, while normal mode analysis of EN models does not, as
illustrated by the case of DNA polymerase β. For this protein, when
the Cα-EN model is built using the crystal structure (pdb code 1bpx),
seven robust modes are found, which are able to describe accurately
($Q_d$ = 84%) the conformational change upon nucleotide binding (pdb
code 1bpy). However, when it is built using the energy-minimized
structure, only three robust modes are found, which are not able to
describe the conformational change ($Q_d$ = 21%) much better than the
three ones obtained using all-atom models ($Q_d$ = 16%). In that case,
the distortion during the energy-minimization process is unusually
large (r.m.s.d. = 2.5 Å), probably as a consequence of the removal of
the large ligand, namely, a sixteen base pair DNA (1bpx is the
structure of a binary complex while 1bpy is the structure of a ternary
complex), prior to the calculation. Even though the amplitude of the
distortion is almost as large as the amplitude of the functional
motion itself (r.m.s.d. = 2.8 Å), the above result is not
straightforward, since the distortion does not occur along the
direction of the conformational change. Indeed, with respect to the
energy-minimized structure, the amplitude of the functional motion
remains large (r.m.s.d. = 2.4 Å).

Figure 4: Quality of the description of the closure motion of
Lactoferrin upon ligand binding, as a function of the number of
low-frequency normal modes (black points) considered. Boxes:
contribution of each robust mode to the description. Horizontal axis:
normal mode frequency (cm-1).

Figure 5: Quality of the description of protein functional motions
with 100 low-frequency modes (filled symbols) or with only the robust
ones (open symbols), as a function of the amplitude of the motion.
Five proteins were studied at the all-atom level (circles) and the
other ones at the amino-acid level (squares).

In the present study, modes obtained with standard all-atom,
many-parameter protein models were compared to those obtained with
elastic network models, as proposed by M. Tirion. For most proteins,
several robust modes are found, confirming results obtained
previously, namely, that the lowest-frequency modes are only weakly
sensitive to details in the protein description. Since such EN models
rely on a distance-cutoff criterion for defining atomic interactions,
this can be explained in two different ways. First, these modes may
capture information about the protein mass distribution in space.
Second, they may capture information about the rigidity of the protein
in the vicinity of each amino-acid residue. Indeed, with a
distance-cutoff criterion, amino-acids in the protein interior are
more rigid (more neighbors) than those on the surface (fewer
neighbors). So, we designed a novel Cα-EN model whose main raison
d'être was to decide between these two possibilities. In this model,
each Cα atom has a given number of interacting neighbors and rigidity
is fairly constant from one point of a protein to another. When modes
obtained with this model are compared to those obtained with a Cα-EN
model based on the distance-cutoff criterion, the same robust modes
are found. This means that they are also not sensitive to the
distribution of rigidity in the protein.

Moreover, we have shown that these robust modes are likely to be
involved in protein functional motions, at least when the functional
motion is a large amplitude one (r.m.s.d. > 2-3 Å). This result should
prove helpful in the context of applications like those mentioned in
the Introduction, since they all concern large amplitude
conformational changes.

This result could also pave the way for the development of methods
able to predict such motions accurately, i.e. to predict their
amplitude, since exploring a subspace of small dimensionality (three
or four in most cases considered) should be enough for finding
conformations close to functional ones. Interestingly, seeking robust
modes could also indicate whether a given protein can exhibit large
amplitude functional motions or not. Indeed, the functional motions of
the four proteins with no robust mode are small amplitude ones.